Correlation modeling of MLLR transform biases for rapid HMM adaptation to new speakers
Authors
Abstract
This paper concerns rapid adaptation of hidden Markov model (HMM) based speech recognizers to a new speaker when only a few speech samples (one minute or less) are available from that speaker. A widely used family of adaptation algorithms defines adaptation as a linearly constrained reestimation of the HMM Gaussians. With so little speech data, tight constraints must be introduced by reducing the number of linear transforms and by specifying certain transform structures (e.g. block diagonal). We hypothesize that under these adaptation conditions, the residual errors of the adapted Gaussian parameters can be represented and corrected by dependency models estimated from a training corpus. Thus, after introducing a particular class of linear transforms, we develop correlation models of the transform parameters. In rapid adaptation experiments on the SWITCHBOARD corpus, the proposed algorithm outperforms both transform-constrained adaptation and adaptation by correlation modeling of the HMM parameters.
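To make "linearly constrained reestimation" concrete, here is a minimal sketch of the standard MLLR-style mean update, in which every Gaussian mean tied to one transform is mapped to A·mean + b and A is restricted to a block-diagonal structure. It is an illustration only, not the particular transform class or the correlation model proposed in the paper; the function names and all numeric values are hypothetical.

```python
def block_diagonal(blocks):
    """Assemble square blocks into one block-diagonal matrix (list of rows)."""
    dim = sum(len(block) for block in blocks)
    matrix = [[0.0] * dim for _ in range(dim)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                matrix[offset + i][offset + j] = value
        offset += len(block)
    return matrix


def adapt_mean(A, b, mean):
    """Return the adapted mean A @ mean + b for a single Gaussian."""
    return [sum(a * m for a, m in zip(row, mean)) + bias
            for row, bias in zip(A, b)]


# Hypothetical toy setup: 4-dimensional features split into two 2x2 blocks,
# with one transform (A, b) shared by all Gaussians.
A = block_diagonal([[[1.0, 0.5], [0.0, 1.0]],
                    [[2.0, 0.0], [0.0, 2.0]]])
b = [0.5, -0.5, 0.0, 1.0]
means = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]]

adapted = [adapt_mean(A, b, mean) for mean in means]
```

Sharing one transform across many Gaussians and zeroing the off-block entries of A both cut the number of free parameters, which is the kind of tightening the abstract describes for very small amounts of adaptation data.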
Similar resources
Incorporating HMM-state sequence confusion for rapid MLLR adaptation to new speakers
In this paper, we introduce the HMM-state sequence confusion characteristics as prior knowledge into the framework of MLLR to relax the transformation and reduce the risk of over-training when the adaptation data size is small. There are two issues to be addressed: first, how to estimate such confusion information reliably; second, how to use the information in refining the estimation of...
Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection
In this paper, we employ the concept of HMM-Sufficient Statistics (HMM-Suff Stat) and N-best speaker selection to realize a rapid implementation of Baum-Welch and MLLR. Only a single arbitrary utterance is required, which is used to select the N-best speakers' HMM-Suff Stat from the training database as adaptation data. Since HMM-Suff Stat are pre-computed offline, computation load is minimized....
Formant-based frequency warping for improving speaker adaptation in HMM TTS
Vocal Tract Length Normalization (VTLN), usually implemented as a frequency warping procedure (e.g. bilinear transformation), has been used successfully to adapt the spectral characteristics to a target speaker in speech recognition. In this study we exploit the same concept of frequency warping but concentrate explicitly on mapping the first four formant frequencies of 5 long vowels from sourc...
A novel target-driven MLLR adaptation algorithm with multi-layer structure
This paper presents a novel target-driven MLLR adaptation algorithm with a multi-layer structure, which is based on a thorough analysis of MLLR using the generation of regression class trees. The new algorithm is constructed on the target-driven principle. It generates the regression classes dynamically, based on the outcome of the former MLLR transformation. The regression classes are defined ...
Unsupervised speaker adaptation based on sufficient HMM statistics of selected speakers
This paper describes an efficient method for unsupervised speaker adaptation. This method is based on (1) selecting a subset of speakers who are acoustically close to a test speaker, and (2) calculating adapted model parameters according to the previously stored sufficient HMM statistics of the selected speakers’ data. In this method, only a few unsupervised test speaker’s data are required for...